Compression Forensics Beyond The First Significant Digit

Authors

  • Sujoy Chakraborty
  • Matthias Kirchner
Abstract

We study characteristics of the second significant digits of block-DCT coefficients computed from digital images. Following previous work on compression forensics based on first significant digits, we examine the merits of stepping towards significant digits beyond the first. Our empirical findings indicate that certain block-DCT modes follow Benford's law of second significant digits extremely well, which allows us to distinguish between never-compressed images and decompressed JPEG images even for the highest JPEG compression quality of 100. As for multiple-compression forensics, we report that second significant digit histograms are highly informative on their own, yet cannot further improve the already good performance of classification schemes that work with first significant digits alone.

Introduction

The widespread use of the JPEG compression standard for still images makes the analysis of JPEG images a major branch of media forensics [1]. Most digital cameras store images in JPEG format, typically using customized compression or file format settings [2, 3], and after processing or manipulation of any sort, images will most likely be stored as JPEG again, using a different set of compression settings. This introduces various forms of requantization artifacts, which may be exploited to infer the compression history of a given image. Scenarios vary greatly, including for instance the detection and characterization of previous JPEG compressions of images stored in bitmap formats [4, 5], the detection and characterization of multiple JPEG compressions [6–11], possibly in the presence of other forms of processing in between consecutive compression steps [12, 13], or the detection of local image manipulations [9, 14–16].

Among the most successful approaches for multiple-compression forensics are those that rely on statistics of the first significant digits (FSDs) of discrete cosine transform (DCT) coefficients. It has been observed that block-DCT coefficients of natural images follow (a generalized) Benford's law of first significant digits [17, 18], while lossy JPEG compression changes the FSD distribution of DCT coefficients. The analysis of empirical FSD histograms is effectively equivalent to aggregating statistics over DCT coefficient histograms. Hence, FSD-based approaches promise more compact feature representations, yet they cannot be expected to fundamentally outperform forensic techniques that work with first-order DCT coefficient statistics directly [19]. Indeed, the literature has demonstrated that FSD-based classifiers are highly capable of identifying the number of compression cycles an image underwent in a relatively low-dimensional feature space [10].

It has been noted only recently that also the second significant digits (SSDs) of block-DCT coefficients exhibit a highly regular behavior across different image databases [20], partially in correspondence with Benford's law. Here, we follow this path and explore to what degree peculiarities in the distribution of block-DCT SSDs can be exploited for compression forensics. While it cannot be expected that switching from first significant digits to higher-order digits will result in tremendous performance gains, it is our hope to deepen the understanding of how significant digits of DCT coefficients behave in various forensically relevant settings, and whether there exist scenarios for which it is beneficial to also consider significant digits beyond the first.
Before we delve into our exploratory analysis, the following two sections briefly summarize how Benford's law characterizes the empirical distribution of significant digits and how FSD features have been utilized in prior work on compression forensics.

Significant Digits and Benford's Law

Denoting $s(x)$ as the decimal¹ significand of a non-zero real number $x \in \mathbb{R} \setminus \{0\}$,
$$ s(x) = 10^{\log|x| - \lfloor \log|x| \rfloor} , $$
the $s$-th significant digit of $x$, $d_s(x)$, $s \in \mathbb{N}$, is given as
$$ d_s(x) = \left\lfloor 10^{s-1} s(x) \right\rfloor - 10 \left\lfloor 10^{s-2} s(x) \right\rfloor . \qquad (1) $$
By definition, the first significant digit (FSD) of $x \neq 0$ is never zero, $d_1(x) \in [1\,..\,9]$. Significant digits $d_s(x)$ with $s > 1$ can also take on the value zero.

Benford's law [21, 22] concerns the statistical distribution of significant digits. It has been found to apply to various types of synthetic and empirical data. Denoting $\Pr(D_s = d_s)$ as the probability that the $s$-th significant digit equals $d_s$, Benford's law states that [22]
$$ \Pr\bigl((D_1, D_2, \dots, D_r) = (d_1, d_2, \dots, d_r)\bigr) = \log\Bigl(1 + \Bigl(\textstyle\sum_{s=1}^{r} 10^{\,r-s} d_s\Bigr)^{-1}\Bigr) . \qquad (2) $$
Specifically, this implies for first and second significant digits that
$$ \Pr(D_1 = d_1) = \log(1 + 1/d_1) \qquad (3) $$
and
$$ \Pr(D_2 = d_2) = \sum_{d_1=1}^{9} \log\bigl(1 + 1/(10\,d_1 + d_2)\bigr) , \qquad (4) $$
respectively. Figure 1 gives a graphical representation of the two probability mass functions. Observe that the SSD distribution is much more uniform than the FSD distribution. Generally speaking, it follows from Benford's law that the marginal distribution of the $s$-th significant digit approaches the uniform distribution as $s \to \infty$ [23]. A sufficient condition for Benford's law to be satisfied is a uniform distribution of $\log s(x)$ over the interval $[0, 1)$ [22].

¹ We work with logarithms to base 10 in this paper.

Figure 1. Benford's law for first (FSD) and second (SSD) significant digits.

FSD-Based JPEG Compression Forensics

First significant digits of block-DCT modes of natural images are commonly assumed to obey (a generalized variant of) Benford's law [17, 18]. A common working assumption of forensic techniques is that FSD distributions after lossy JPEG compression exhibit a fundamentally different behavior. By analyzing empirical FSD histograms, this can be exploited to detect traces of prior JPEG compressions in decompressed images or to determine the number and parameters of previous JPEG compression cycles. Early methods leaned towards explicit tests of whether Benford's law is satisfied [18, 24]. More recent techniques draw more heavily on machine learning support. Most recently, Milani et al. [10] empirically determined a set of FSD features for multiple-compression forensics by assessing a large number of possible digit combinations from a predetermined set of nine low-frequency JPEG coefficients proposed in [25]. Specifically, the authors focused on fixed $n$-tuples of digits and considered the corresponding $n \times 9$ FSD histogram bins as feature space. Digits 2, 5 and 6 were found to work particularly well. The classification algorithm combines a set of binary classifiers, individually trained to distinguish between various compression settings one at a time. The final classification result is then aggregated from the combined binary decisions.
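For illustration, the digit extraction of Equation (1) and the Benford probabilities of Equations (3) and (4) can be sketched in a few lines of Python. The snippet below is our own minimal example rather than code from the paper; the function names and the toy data generator are assumptions made purely for demonstration.

```python
import numpy as np

def significant_digit(x, s):
    """s-th significant digit d_s(x) of non-zero values x, cf. Eq. (1)."""
    x = np.asarray(x, dtype=float)
    # decimal significand s(x) = 10^(log10|x| - floor(log10|x|)), lies in [1, 10)
    lg = np.log10(np.abs(x))
    sig = 10.0 ** (lg - np.floor(lg))
    return (np.floor(10.0 ** (s - 1) * sig)
            - 10.0 * np.floor(10.0 ** (s - 2) * sig)).astype(int)

def benford_fsd(d1):
    """Pr(D1 = d1) according to Eq. (3), d1 in 1..9."""
    return np.log10(1.0 + 1.0 / d1)

def benford_ssd(d2):
    """Pr(D2 = d2) according to Eq. (4), d2 in 0..9."""
    d1 = np.arange(1, 10)
    return np.sum(np.log10(1.0 + 1.0 / (10 * d1 + d2)))

if __name__ == "__main__":
    # toy Benford-like data: products of several independent random factors
    rng = np.random.default_rng(0)
    x = np.prod(rng.uniform(0.1, 10.0, size=(100_000, 5)), axis=1)
    for d in range(10):
        print(d, round(float(np.mean(significant_digit(x, 2) == d)), 4),
              round(float(benford_ssd(d)), 4))
```

For images, the same digit extraction is simply applied to block-DCT coefficients, as described in the experimental setup below.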
Experiments With Block-DCT SSDs

Positive reports about FSD-based compression forensics in the literature have also led to a number of counter-forensic techniques [27] that attempt to restore block-DCT coefficient FSD histograms after JPEG compression [28, 29]. Yet we demonstrated in a recent work [20] that restoration algorithms under a minimum cost constraint can lead to detectable artifacts in the histograms of second significant digits. One of the major findings along the way was that block-DCT SSD histograms from never-compressed images exhibit a highly regular behavior. A question that naturally arises is thus whether SSD histogram features are similarly suitable for compression forensics as FSD features.

Experimental Setup

We work with the 1338 UCID images [26] in our experiments. All images are of size 384×512. The images were converted to grayscale before further processing.² We use the Independent JPEG Group reference library with floating-point DCT implementation and standard quantization tables to obtain JPEG versions of the database, considering single compression with quality factors from the set $Q = \{35, 40, \dots, 100\}$ and double compression with quality factor combinations in $Q \times Q$. For a given tuple $q = (q_1, q_2) \in Q \times Q$, we use the notation $q_1 \to q_2$ to refer to compression with quality factor $q_1$ followed by a second compression with quality factor $q_2$. Similar to Milani et al. [10], we work with a simplified setup that limits the set of primary quality factors to
$$ Q^{\Delta}_{q_2} = \{\, q_1 \in Q : 0 < |q_2 - q_1| \le \Delta \,\} , \qquad (5) $$
i.e., for a specific choice of $q_2$, we consider only quality factor combinations $(q_1, q_2) \in Q^{\Delta}_{q_2} \times \{q_2\}$.

² ImageMagick convert with option -grayscale Rec601Luma.

We analyze DCT coefficients, rounded to six digits, which are computed from non-overlapping 8×8 pixel blocks $(y_{m,n})$, $0 \le m, n \le 7$, with integer intensities $y_{m,n} \in [0\,..\,255]$ as
$$ x_{i,j} = \sum_{m=0}^{7} \sum_{n=0}^{7} (y_{m,n} - 128) \cdot b^{(i,j)}_{m,n} . \qquad (6) $$
The elements of the $(i, j)$-th basis vector $b^{(i,j)}$ are given as
$$ b^{(i,j)}_{m,n} = \frac{c_i c_j}{4} \cdot \cos\!\left(\frac{\pi i (2m+1)}{16}\right) \cdot \cos\!\left(\frac{\pi j (2n+1)}{16}\right) , \qquad (7) $$
where $0 \le i, j \le 7$, $c_0 = 1/\sqrt{2}$ and $c_i = 1$ for $i > 0$. Pixel blocks are aligned with the JPEG grid if the image was previously stored as JPEG. Normalized FSD and SSD histograms are computed per DCT mode $(i, j)$. Omitting index $(i, j)$ for the sake of brevity, and denoting $\mathbf{x} = (x_k)$, $0 \le k < N$, as the vector of coefficients satisfying $x_k \neq 0$ and $x_k \neq d_1(x_k)$, the normalized histogram of the $s$-th significant digit is
$$ h_s(d, \mathbf{x}) = \frac{1}{N} \sum_{k=0}^{N-1} \delta\bigl(d - d_s(x_k)\bigr) , \qquad (8) $$
where $\delta(\cdot)$ denotes the Kronecker delta function.

Never-compressed Images

Figure 2 illustrates the distribution of second significant digits obtained from block-DCT modes of never-compressed images [20]. Each of the 8×8 sub-graphs depicts the median SSD histogram (aggregated over all images in the UCID database) in correspondence to the DCT coefficient index. Plots of the "ideal" distribution according to Benford's law in Equation (4) are given as reference with each empirical histogram. The figure indicates that the majority of DCT modes adhere to Benford's law strikingly well. There are three strong outliers at DCT coefficient indices $(i, j) \in \{(4,0), (0,4), (4,4)\}$. This can be explained by the special form of the DCT basis at those frequencies. As pointed out in [30] in the context of steganography, it is straightforward to verify that Equation (7) evaluates to $b^{(i,j)}_{m,n} = \pm 1/8$ for all $m, n$ when $(i, j) \in \{(4,0), (0,4), (4,4)\}$. This implies that DCT coefficients computed from integer pixel intensities will exclusively be integer multiples of $1/8$ at those frequencies, i.e., second significant digits $\{0, 4, 9\}$ can only occur for coefficients $\{x : |x| > 10\}$. Considering the Laplacian-like distribution of DCT coefficients, these SSD histogram bins will be populated only minimally. Similar but far less pronounced artifacts can also be observed for other coefficient indices with even row and column indices [20].
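To make the measurement pipeline of Equations (6)–(8) concrete, the following Python sketch computes the coefficients of a single DCT mode from non-overlapping 8×8 blocks and derives the normalized SSD histogram. This is our own illustrative code rather than the authors' implementation; it assumes a grayscale image supplied as an integer NumPy array, the function names are ours, and the eligibility filter follows the set definition as printed above. Printing the basis of modes (4,0), (0,4) and (4,4) also reproduces the ±1/8 property discussed above.

```python
import numpy as np

def dct_basis(i, j):
    """8x8 basis b^{(i,j)}_{m,n} of Eq. (7)."""
    c = lambda k: 1.0 / np.sqrt(2.0) if k == 0 else 1.0
    m = np.arange(8).reshape(-1, 1)
    n = np.arange(8).reshape(1, -1)
    return (c(i) * c(j) / 4.0
            * np.cos(np.pi * i * (2 * m + 1) / 16)
            * np.cos(np.pi * j * (2 * n + 1) / 16))

def mode_coefficients(img, i, j):
    """Block-DCT coefficients x_{i,j} of Eq. (6) for all non-overlapping 8x8 blocks."""
    h, w = (img.shape[0] // 8) * 8, (img.shape[1] // 8) * 8
    blocks = (img[:h, :w].astype(float) - 128.0).reshape(h // 8, 8, w // 8, 8)
    return np.einsum('ambn,mn->ab', blocks, dct_basis(i, j)).ravel()

def ssd_histogram(x):
    """Normalized SSD histogram h_2(d, x) of Eq. (8) over eligible coefficients."""
    x = np.round(np.asarray(x, dtype=float), 6)   # coefficients rounded to six digits
    x = x[x != 0]                                  # discard zero coefficients
    lg = np.log10(np.abs(x))
    sig = 10.0 ** (lg - np.floor(lg))
    d1 = np.floor(sig).astype(int)
    keep = x != d1                                 # exclude coefficients equal to their FSD
    d2 = (np.floor(10.0 * sig) - 10.0 * d1).astype(int)[keep]
    counts = np.bincount(d2, minlength=10)
    return counts / counts.sum()

if __name__ == "__main__":
    # the basis elements of these modes are all +/- 1/8
    for ij in [(4, 0), (0, 4), (4, 4)]:
        print(ij, np.unique(np.round(8 * dct_basis(*ij), 6)))
    # example SSD histogram for mode (2, 1) of a random test image
    img = np.random.default_rng(0).integers(0, 256, size=(384, 512))
    print(ssd_histogram(mode_coefficients(img, 2, 1)))
```

Aggregating such per-mode histograms over an image database and comparing them with Equation (4) yields curves of the kind discussed for Figures 2 and 3.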
Decompressed JPEG Images

Figure 3 continues with a closer look at block-DCT SSD distributions after JPEG compression and decompression. Specifically, we plot the empirical histograms aggregated over all UCID images after compression with quality factor 100. The histograms
